Chemical name recognition with harmonized feature-rich conditional random fields

Authors

  • David Campos
  • Sérgio Matos
  • José Luís Oliveira
Abstract

This article presents a machine learning-based solution for automatic chemical and drug name recognition in scientific documents, which was applied in the BioCreative IV CHEMDNER task, namely in the chemical entity mention recognition (CEM) and the chemical document indexing (CDI) sub-tasks. The proposed approach applies conditional random fields (CRFs) with a rich feature set, including linguistic, orthographic, morphological, dictionary matching and local context (i.e., conjunctions) features. Post-processing modules are also integrated, performing parentheses correction and abbreviation resolution. Finally, heterogeneous CRF models are harmonized to generate improved annotations. Results on the development set are encouraging, with F-scores of 83.71% on CEM and 82.05% on CDI.
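As a rough illustration of the orthographic, morphological and local context (conjunction) features mentioned in the abstract, the following minimal Python sketch builds one feature dictionary per token. The abstract does not list the actual features, so every feature name here, the word-shape encoding, the window size and the function names `token_features` and `sentence_features` are assumptions made for this example, not the authors' feature set.

```python
import re


def token_features(token):
    """Illustrative orthographic and morphological features for one token."""
    # Word shape: uppercase -> "A", lowercase -> "a", digit -> "0".
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return {
        "lower": token.lower(),
        "has_digit": any(c.isdigit() for c in token),
        "has_upper": any(c.isupper() for c in token),
        "has_hyphen": "-" in token,
        "has_paren": any(c in "()[]" for c in token),
        "prefix3": token[:3].lower(),
        "suffix3": token[-3:].lower(),
        "shape": shape,
    }


def sentence_features(tokens, window=1):
    """Add local context: neighbouring words and shape conjunctions."""
    base = [token_features(t) for t in tokens]
    rows = []
    for i, feats in enumerate(base):
        # Features of the current token, prefixed with position "0".
        row = {f"0:{name}": value for name, value in feats.items()}
        for offset in range(-window, window + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(tokens):
                continue
            # Neighbouring word, keyed by its relative position.
            row[f"{offset:+d}:lower"] = base[j]["lower"]
            # Conjunction of the current shape with the neighbour's shape.
            row[f"0|{offset:+d}:shape"] = feats["shape"] + "|" + base[j]["shape"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in sentence_features(["aspirin", "and", "H2O"]):
        print(row)
```

Per-token dictionaries of this kind are a common input format for CRF toolkits. Dictionary matching, linguistic features and the post-processing steps described in the abstract are not covered by this sketch.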


Similar resources

Identification of chemical and gene mentions in patent texts using feature-rich conditional random fields

This article describes the application of Neji, a text-processing and concept recognition framework, to the automatic recognition of chemical and gene mentions in medicinal chemistry patents. We used conditional random fields models trained with an optimized set of features including linguistic, orthographic, morphological, dictionary matching and local context features, dictionary-matching, and...


WHU-BioNLP CHEMDNER System with Mixed Conditional Random Fields and Word Clustering

Our team participated in the Chemical Compound and Drug Name Recognition task of BioCreative IV. We used mixed conditional random fields with word clustering to fulfill this task. On the one hand, we generate the word feature by word clustering and train on the corpus with the word feature to get one model. On the other hand, the training corpus is transformed to a new one in the reversed order of ...


A Survey on Machine Learning Techniques to Extract Chemical Names from Text Documents

Chemical name extraction is of great importance in the biomedical field. Named Entity Recognition is the subtask of information extraction that is used to identify named entities in the given data. There are various dictionary-based, rule-based and machine learning approaches available for Named Entity Recognition. Rule-based techniques include handwritten rules. In this paper an extensive...


A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three popular methods have conventionally been used to extract named entities from a text, namely rule-based, machine-learning-based and hybrid methods. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...


Chemlistem - chemical named entity recognition using recurrent neural networks

Chemical named entity recognition has traditionally been dominated by CRF (Conditional Random Fields)-based approaches but given the success of the artificial neural network techniques known as "deep learning" we decided to examine them as an alternative to CRFs. We present here three systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using ric...




Publication date: 2013